HIERARCHICAL CONSTRAINED AUTOMATIC
LEARNING NETWORK FOR CHARACTER RECOGNITION
Abstract 

Highly accurate, reliable optical character recognition is afforded by a
hierarchically layered network having several layers of parallel constrained
feature detection for localized feature extraction followed by several fully
connected layers for dimensionality reduction. Character classification is also
performed in the ultimate fully connected layer. Each layer of parallel
constrained feature detection comprises a plurality of constrained feature maps
and a corresponding plurality of kernels wherein a predetermined kernel is
directly related to a single constrained feature map. Undersampling is performed
from layer to layer.

Technical Field 

This invention relates to the field of pattern recognition and, more
particularly, to massively parallel, constrained networks for optical character
recognition.

Background of the Invention

Computation systems based upon adaptive learning with fine-grained
parallel architectures have moved out of obscurity in recent years because of the
growth of computer-based information gathering, handling, manipulation, storage,
and transmission. Many concepts applied in these systems represent potentially
efficient approaches to solving problems such as providing automatic recognition,
analysis and classification of character patterns in a particular image.
Ultimately, the value of these techniques in such systems depends on their
effectiveness or accuracy relative to conventional approaches.

In a recent article by Y. LeCun entitled "Generalization and Network
Design Strategies," appearing in Connectionism in Perspective, pp. 143-155
(Elsevier Science Publishers: North-Holland 1989), the author describes five
different layered network architectures applied to the problem of optical digit
recognition. Learning in each of the networks was attempted on pixel images of
handwritten digits via inherent classification intelligence acquired from the
back-propagation technique described by D. Rumelhart et al., Parallel Distributed
Processing, Vol. I, pp. 318-362 (Bradford Books: Cambridge, Mass. 1986).
Complexity of the networks was shown to increase from a two layer, fully connected
network called Net-2 to a hierarchical network called Net-5 having two levels of
constrained feature maps for hierarchical feature extraction. The network Net-2 was
said to have a significantly larger standard deviation in generalization performance
than single layer, fully connected networks indicating, thereby, that the former
network is largely undetermined with a large number of solutions consistent with its
training set. But, as stated by LeCun, "[u]nfortunately, these various solutions do
not give equivalent results on the test set, thereby explaining the large variations in
generalization performance ... it is quite clear that the network is too big (or has too
many degrees of freedom)." Performance of the most complex hierarchical network,
that is, Net-5, exceeded that of the less complex networks. Moreover, it was
postulated that the multiple levels of constrained feature maps provided additional
control for shift invariance.

While the hierarchical network described above appears to have
advanced the art of solving the character recognition or classification problem, it is
equally apparent that existing systems lack sufficient accuracy to permit realization
of reliable automatic character recognition apparatus.

Summary of the Invention

Highly accurate, reliable optical character recognition is afforded by a
hierarchically-layered network having several layers of parallel constrained feature
detection for localized feature extraction followed by several fully connected layers
for dimensionality reduction. Character classification is also performed in the
ultimate fully connected layer. Each layer of parallel constrained feature detection
comprises a plurality of constrained feature maps and a corresponding plurality of
kernels wherein a predetermined kernel is directly related to a single constrained
feature map. Undersampling occurs from layer to layer.

In an embodiment according to the principles of the invention, the
hierarchical network comprises two layers of constrained feature detection followed
by two fully connected layers of dimensionality reduction. Each constrained feature
map comprises a plurality of units. Units in each constrained feature map of the first
constrained feature detection layer respond as a function of both the corresponding
kernel for the constrained feature map and different portions of the pixel image of
the character captured in a receptive field associated with the unit. Units in each
feature map of the second constrained feature detection layer respond as a function
of both the corresponding kernel for the constrained feature map and different
portions of an individual constrained feature map or a combination of several
constrained feature maps in the first constrained feature detection layer as captured
in a receptive field of the unit. Feature maps of the second constrained feature
detection layer are fully connected to each unit in the first dimensionality reduction
layer. Units in the first dimensionality reduction layer are fully connected to each
unit of the second dimensionality reduction layer for final character classification.
Kernels are automatically learned by constrained back propagation during network
initialization or training.
Benefits realized from this network architecture are increased shift
invariance and reduced entropy, Vapnik-Chervonenkis dimensionality and number
of free parameters. As a result of these improvements, the network experiences a
proportional reduction in the amount of training data and training time required to
achieve a given level of generalization performance.

Brief Description of the Drawing 

A more complete understanding of the invention may be obtained by
reading the following description of specific illustrative embodiments of the
invention in conjunction with the appended drawing in which:

FIG. 1 is a simplified block diagram for each individual computational
element in the network;

FIG. 2 is a simplified block diagram of an exemplary hierarchical
constrained automatic learning network in accordance with the principles of the
invention;

FIG. 3 is a simplified diagram showing the connective relationship
between units in a map at one level with a unit in a map at the next higher adjacent
level; and

FIGS. 4 through 19 are a collection of exemplary kernel representations
utilized in the exemplary network of FIG. 2.

Detailed Description

Computational elements as shown in FIG. 1 form the fundamental
functional and interconnectionist blocks for a hierarchical constrained network
realized in accordance with the principles of the invention. In general, a
computational element forms a weighted sum of input values for n+1 inputs and
passes the result through a nonlinearity to arrive at a single value. The input and
output values for the computational element may be analog, quasi-analog such as
multi-level and gray scale, or binary in nature. Nonlinearities commonly employed
in computational elements include hard limiters, threshold logic elements, sigmoidal
nonlinearities, and piecewise nonlinear approximations, for example.

In operation, the computational element shown in FIG. 1 scans n
neighboring input pixels, pixel values or unit values from an image or feature map
wherein the pixels, pixel values and unit values have values such as brightness levels
represented as a_1, a_2, ..., a_n. An input bias is supplied to the n+1 input of a
computational element. For simplicity, the bias is generally set to a constant value
such as 1. The input values and the bias are supplied to multipliers 1-1 through
1-(n+1). The multipliers also accept input from a kernel having weights w_1 through
w_{n+1}. Outputs from all multipliers are supplied to adder 2 which generates the
weighted sum of the input values. As such, the output from adder 2 is simply the dot
product of a vector of input values (including a bias value) with a vector representing
the kernel of weights. The output value from adder 2 is passed through a nonlinear
function in nonlinearity 3 to generate a single unit output value x_i. As will be
understood more clearly below, unit output value x_i is related to the value of the i-th
unit in the feature map under consideration.
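
The computational element just described can be sketched in a few lines of
Python. This is an illustrative sketch only: the function name `unit_output`, the
choice of `math.tanh` as the default nonlinearity, and the sample input and weight
values are assumptions made for the example and are not taken from FIG. 1.

```python
import math

def unit_output(inputs, kernel, nonlinearity=math.tanh, bias_input=1.0):
    """Weighted sum of n inputs plus a constant bias input, then a nonlinearity.

    `kernel` holds n+1 weights; the last weight multiplies the bias input.
    """
    if len(kernel) != len(inputs) + 1:
        raise ValueError("kernel needs n+1 weights (n inputs plus the bias)")
    extended = list(inputs) + [bias_input]            # the n+1 input values
    weighted_sum = sum(a * w for a, w in zip(extended, kernel))  # adder
    return nonlinearity(weighted_sum)                 # nonlinearity stage

# Hypothetical values: three inputs and four weights (the last is the bias weight).
x = unit_output([0.5, -1.0, 0.25], [0.2, 0.1, -0.4, 0.3])
```

Here the weighted sum is 0.1 - 0.1 - 0.1 + 0.3 = 0.2, so `x` equals tanh(0.2).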

In an example from experimental practice, an exemplary sigmoidal
function for nonlinearity 3 is chosen as a scaled hyperbolic tangent function,
f(a) = A tanh(Sa), where a is the weighted sum input to the nonlinearity, A is the
amplitude of the function, and S determines the slope of the function at the origin.
The exemplary nonlinearity is an odd function with horizontal asymptotes at +A
and -A. It is understood that nonlinear functions exhibiting an odd symmetry are
believed to yield faster convergence of the kernel weights w_1 through w_{n+1}.
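
A minimal sketch of the scaled hyperbolic tangent follows. The description does
not give numerical values for A and S, so the values used below (A = 2.0,
S = 0.5) are arbitrary assumptions chosen only to exercise the odd symmetry and
the asymptotes.

```python
import math

def scaled_tanh(a, amplitude, slope):
    """f(a) = A * tanh(S * a): odd, with horizontal asymptotes at +A and -A."""
    return amplitude * math.tanh(slope * a)

A, S = 2.0, 0.5                        # assumed values, not from the description
near_upper = scaled_tanh(100.0, A, S)  # approaches +A for large positive input
near_lower = scaled_tanh(-100.0, A, S) # approaches -A for large negative input
odd_gap = scaled_tanh(0.3, A, S) + scaled_tanh(-0.3, A, S)  # ~0 for an odd function
```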

Weights for each of the kernels in the hierarchical constrained network
were obtained using a trial and error learning technique known as back propagation.
See the Rumelhart et al. reference cited above or see R. P. Lippmann, "An
Introduction to Computing with Neural Nets", IEEE ASSP Magazine, Vol. 4, No. 2,
pp. 4-22 (1987). Prior to training, each weight in the kernel is initialized to a
random value using a uniform distribution between, for example, -2.4/F_i and 2.4/F_i
where F_i is the number of inputs (fan-in) of the unit to which the connection belongs.
For the example shown in FIG. 1, the fan-in is equal to n+1. An exemplary output
cost function is the well known mean squared error function:

$$ \mathrm{MSE} = \frac{1}{P\,O} \sum_{p=1}^{P} \sum_{o=1}^{O} \left( d_{op} - x_{op} \right)^{2} $$

where P is the number of patterns, O is the number of output units, d_op is the desired
state for output unit o when pattern p is presented, and x_op is the state for output unit
o when pattern p is presented. By using this initialization technique, it is possible to
maintain values within the operating range of the sigmoid nonlinearity. During
training, image patterns are presented in a constant order. Weights are updated
according to the stochastic gradient or "on-line" procedure after each presentation of
a single image pattern for recognition. A true gradient procedure may be employed
for updating so that averaging takes place over the entire training set before weights
are updated. It is understood that the stochastic gradient is found to cause weights to
converge faster than the true gradient especially for large, redundant image data
bases. A variation of the Back-Propagation algorithm computes a diagonal
approximation to the Hessian matrix to set the learning rate optimally. Such a
"pseudo-Newton" procedure produces a reliable result without requiring extensive
adjustments of parameters. See Y. LeCun, Modeles Connexionnistes de
l'Apprentissage, PhD Thesis, Universite Pierre et Marie Curie, Paris, France (1987).
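
The initialization rule and the cost function described above can be sketched as
follows. The function names are hypothetical, and the normalization of the error
(an average over all P patterns and O output units) is an assumption about the
exact form of the mean squared error; a different constant factor would not
change the location of its minimum.

```python
import random

def init_kernel(fan_in, rng):
    """Draw fan_in weights uniformly from [-2.4/fan_in, 2.4/fan_in]."""
    limit = 2.4 / fan_in
    return [rng.uniform(-limit, limit) for _ in range(fan_in)]

def mean_squared_error(desired, actual):
    """Average squared error over P patterns and O output units (assumed form)."""
    num_patterns = len(desired)
    num_outputs = len(desired[0])
    total = sum((d - x) ** 2
                for d_row, x_row in zip(desired, actual)
                for d, x in zip(d_row, x_row))
    return total / (num_patterns * num_outputs)

# A unit with 25 inputs plus a bias has a fan-in of 26.
kernel = init_kernel(26, random.Random(0))

# Two patterns, two output units; exactly one output is wrong by 2.
error = mean_squared_error([[1.0, -1.0], [-1.0, 1.0]],
                           [[1.0, -1.0], [1.0, 1.0]])
```

With one squared error of 4 spread over four pattern-output pairs, `error` is 1.0.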

Standard techniques are employed to convert a handwritten character to
the pixel array which forms the supplied character image. The character image may
be obtained through electronic transmission from a remote location or it may be
obtained locally with a scanning camera or other scanning device. Regardless of its
source and in accordance with conventional practice, the character image is
represented by an ordered collection of pixels. The ordered collection is typically an
array. Once represented, the character image is generally captured and stored in an
optical memory device or an electronic memory device such as a frame buffer.

Each pixel has a value associated therewith which corresponds to the
light intensity or color or the like emanating from that small area on the visual
character image. Values of the pixels are then stored in the memory devices. When
reference is made to a particular map, it is understood that the terms "pixel" and
"unit value(s)" are used interchangeably and include pixels, pixel values and unit
values output from each computation element combining to form the map array. It
may be more convenient to think in terms of planes or 2-dimensional arrays (maps)
of pixels rather than pixel values or unit values for visualizing and developing an
understanding of network operation.

In addition to visualizing pixel and unit values with pixel intensity
levels, it is also useful to visualize the array of weights in the kernel in this manner.
See, for example, FIGs. 14 and 15, arranged according to the diagram in FIG. 13,
which represent arrays of kernels learned during an experiment with the network
embodiment in FIG. 2. Also, by visualizing the kernel as an array, it is possible to
understand more easily how and what the kernel affects in the pixel array undergoing
feature extraction.

Various other preprocessing techniques used to prepare a character
image as a pixel array for character recognition may include various linear
transformations such as scaling, size normalization, deskewing, centering, and
translation or shifting, all of which are well known to those skilled in the art. In
addition, transformation from the handwritten character to a gray scale pixel array
may be desirable to preserve information which would otherwise be irretrievably lost
during preprocessing. The latter transformation is understood to be well known to
those skilled in the art.

In addition to the operations listed above for preparation of the image
for character recognition, it is generally desirable to provide a uniform, substantially
constant level border around the original image. Such a border is shown in array 102
wherein the array elements outside array 101 in image 10 constitute the uniform
border. In the example described below, the input to the network is a 16 by 16
gray-scale image that is formed by normalizing the raw image. The image is
gray-scale rather than binary since a variable number of pixels in the raw image can
fall into a given pixel in the normalized image.
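
One way to see why normalization yields gray-scale values is the sketch below,
which averages the raw binary pixels that land in each cell of a 16 by 16 grid.
The averaging rule, the function name `normalize_to_grid`, and the synthetic
40 by 24 test pattern are assumptions for illustration; the description does not
specify the normalization procedure actually used.

```python
def normalize_to_grid(raw, size=16):
    """Map a binary raw image onto a size x size grid by averaging.

    Each raw pixel is assigned to one grid cell; a cell's value is the mean of
    the raw pixels assigned to it, so cells that receive a mix of 0s and 1s
    take intermediate (gray) values.
    """
    rows, cols = len(raw), len(raw[0])
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for r in range(rows):
        for c in range(cols):
            target_row = r * size // rows
            target_col = c * size // cols
            sums[target_row][target_col] += raw[r][c]
            counts[target_row][target_col] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else 0.0
             for j in range(size)]
            for i in range(size)]

# Synthetic 40 x 24 binary pattern standing in for a raw character image.
raw_image = [[1 if (r + c) % 3 == 0 else 0 for c in range(24)] for r in range(40)]
normalized = normalize_to_grid(raw_image)
```

Because 40 and 24 are not multiples of 16, the grid cells receive different
numbers of raw pixels, which is the situation the paragraph above describes.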

Realization of the computational elements and, for that matter, the entire
network may be in hardware or software or some convenient combination of
hardware and software. Much of the network presented herein has been
implemented using a SUN workstation with simple programs performing the
rudimentary mathematical operations of addition, subtraction, multiplication, and
comparison. Pipelined devices, microprocessors, and special purpose digital signal
processors also provide convenient architectures for realizing the network in
accordance with the principles of the invention. MOS VLSI technology has also
been employed to implement particular weighted interconnection networks of the
type shown in FIG. 2. Local memory is desirable to store pixel and unit values and
other temporary computation results.

FIG. 2 shows a simplified block diagram of an exemplary embodiment
for a hierarchical constrained automatic learning network in accordance with the
principles of the invention. The network performs character recognition from a
supplied image via massively parallel computations. Each array shown as a
box in the FIG. in layers 20 through 50 is understood to comprise a plurality of
computational elements, one per array unit. All of the connections in the network
are adaptive, although heavily constrained, and are trained using Back-Propagation.
In addition to the input and output layer, the network has three hidden layers
respectively named layer 20, layer 30 and layer 40. Connections entering layer 20
and layer 30 are local and are heavily constrained.

The exemplary network shown in FIG. 2 comprises first and second
feature detection layers and first and second dimensionality reduction layers, wherein
the latter dimensionality reduction layer is a character classification layer. Each
layer comprises one or more feature maps or arrays of varying size. In most
conventional applications, the maps are square. However, rectangular and other
symmetric and non-symmetric or irregular map patterns are contemplated. The
arrangement of detected features is referred to as a map because an array is
constructed in the memory device where the pixels (unit values) are stored and
feature detections from one lower level map are placed in the appropriate locations
in the array for that map. As such, the presence or substantial presence (using gray
scale levels) of a feature and its relative location are thus recorded.

The type of feature detected in a map is determined by the kernel being
used. It should be noted that the kernel contains the weights which multiply the pixel
values of the image being scanned in the computation element. In constrained
feature maps, the same kernel is used for each unit of the same map. That is, a
constrained feature map is a scan of a pixel array representing the non-occurrence or
the occurrence of the particular feature defined by the one associated kernel. As
such, the term "constrained" is understood to convey the condition that computation
elements comprising a particular map are forced to share the same set of kernel
weights. This results in the same feature being detected at different locations in an
input image. In other words, a constrained feature map provides a representation of
the occurrence of the same feature localized in some manner. It is understood that
this technique is also known as weight sharing.

For those skilled in the art, it will be understood that the kernel defines a
receptive field (e.g., 5 pixels x 5 pixels or 2 pixels x 2 pixels) on the plane of the
image pixels or map units being detected for occurrence of the feature defined by the
kernel. By placement of the kernel on a pixel array, it is possible to show what
pixels are being input to the computation element in the feature map and which unit
on the feature map is being activated. The unit being activated corresponds
generally to an approximate location of the feature occurrence in the map under
detection.

The first feature detection layer includes a plurality of constrained
feature maps 20. As shown in the figure, the particular embodiment of the network
includes twelve of the constrained feature maps. The second feature detection
layer includes a plurality of constrained feature maps 30. As shown in the figure, the
particular embodiment of the network includes twelve of the constrained
feature maps in the second layer.
The two upper layers of the network comprise dimensionality reduction
layers 40 and 50 wherein layer 50 is a character classification layer. Layer 40 is
fully connected to all constrained feature maps of the second feature detection layer.
The character classification layer is fully connected to all units in dimensionality
reduction layer 40. Layer 50 generates an indication of the character (alphabetic or
numeral) recognized by the network from the supplied original image. The term
"fully connected" is understood to mean that the computation element associated
with a pixel in layer 40 receives its input from every pixel or unit included in the
preceding layer of maps, that is, layer 30.

Interconnection lines from layer to layer in the network shown in FIG. 2
have been drawn to show which maps in a preceding layer provide inputs to each and
every computation element whose units form the maps in the next higher network
layer of interest. For example, constrained feature maps 201 through 212 detect
different features from image 10 in the process of generating the constrained feature
maps. Proceeding to the next level of maps, feature reduction maps 301 through 312
derive their input from the units in combinations of eight different constrained
feature maps 201 through 212. Constrained feature maps 301, 302 and 303 derive
their inputs from combinations of units in constrained feature maps 201, 202, 203,
204, 209, 210, 211, and 212 using exemplary kernels from FIGs. 7-9; constrained
feature maps 304, 305, and 306 derive their inputs from combinations of units from
constrained feature maps 203, 204, 205, 206, 209, 210, 211, and 212 using
exemplary kernels from FIGs. 10-12; constrained feature maps 307, 308, and 309
derive their inputs from combinations of units from constrained feature maps 205
through 212, inclusively, using exemplary kernels from FIGs. 13-15; and constrained
feature maps 310, 311, and 312 derive their inputs from combinations of units from
constrained feature maps 201, 202, and 207 through 212, inclusively, using
exemplary kernels from FIGs. 16-19. Exemplary kernels used for weighting the
interconnections between image 10 and layer 20 are shown in FIGs. 4-6.
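
The map-to-map connection pattern listed above can be written down as a small
lookup table and checked mechanically. The sketch below uses hypothetical names
(`SHARED_MAPS`, `GROUP_INPUTS`, `build_connection_table`) and assumes that each
of the eight source maps contributes one 5 by 5 neighborhood per receiving unit.

```python
# Layer-20 maps that feed every map in layer 30.
SHARED_MAPS = [209, 210, 211, 212]

# The other four layer-20 source maps for each group of three layer-30 maps.
GROUP_INPUTS = {
    (301, 302, 303): [201, 202, 203, 204],
    (304, 305, 306): [203, 204, 205, 206],
    (307, 308, 309): [205, 206, 207, 208],
    (310, 311, 312): [201, 202, 207, 208],
}

def build_connection_table():
    """Return {layer-30 map number: sorted list of its layer-20 source maps}."""
    table = {}
    for targets, sources in GROUP_INPUTS.items():
        for target in targets:
            table[target] = sorted(sources + SHARED_MAPS)
    return table

CONNECTIONS = build_connection_table()

KERNEL_AREA = 5 * 5  # assumed: one 5 by 5 neighborhood per source map
weights_per_unit = {target: len(sources) * KERNEL_AREA
                    for target, sources in CONNECTIONS.items()}
```

Every layer-30 map ends up with exactly eight source maps, which gives 200
weights per unit before the bias is counted.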

Dimensionality reduction layer 40 includes more elements than are in
the classification layer 50. As shown in FIG. 2 for an exemplary number recognition
network, there are 30 units or elements shown in layer 40. It should be noted that the
character classification layer 50 includes a sufficient number of elements for the
particular character recognition problem being solved by the network. That is, for
the recognition of either upper case or lower case Latin alphabetic characters, one
exemplary embodiment of layer 50 would include 26 units signifying the letters A
through Z or a through z, respectively. On the other hand, for the recognition of
numeric characters, one embodiment of layer 50 would include only 10 units
signifying the numbers 0 through 9, respectively.

For convenience and ease of understanding, the bias input to the
computational element and its associated weight in the kernel shown in FIG. 1 have
been omitted from FIGs. 3 through 19 and in the description herein. In experimental
practice, the bias is set to 1 and its corresponding weight in the kernel is learned
through back propagation although the kernel element for the bias input is not shown
in any of the FIGs.

Layer 20 is composed of 12 groups of 64 units arranged as 12
independent 8 by 8 feature maps. These twelve feature maps will be designated as
map 201, map 202, ..., map 212. Each unit in a feature map takes input from a
5 by 5 neighborhood on the input plane. For units in layer 20 that are one unit apart,
their receptive fields (in the input layer) are two pixels apart. Thus, the input image
is undersampled and some position information is eliminated in the process. A
similar two-to-one undersampling occurs going from layer 20 to layer 30.

This design is motivated by the consideration that high resolution may
be needed to detect whether a feature of a certain shape appears in an image, while
the exact position where that feature appears need not be determined with equally
high precision. It is also known that the types of features that are important at one
place in the image are likely to be important in other places.

Therefore, corresponding connections on each unit in a given feature
map are constrained to have the same weights. In other words, all of the 64 units in
map 201 use the same set of 25 weights. Each unit performs the same operation on
corresponding parts of the image. The function performed by a feature map can thus
be interpreted as a generalized convolution with a 5 by 5 kernel.

Of course, units in another map (e.g., map 204) share another set of 25
weights. It is worth mentioning that units do not share their biases (thresholds).
Each unit thus has 25 input lines plus a bias. Connections extending past the
boundaries of the input take their input from a virtual back-ground plane whose state
is equal to a constant, pre-determined background level, in our case -1. Thus,
layer 20 comprises 768 units (8 by 8 times 12), 19968 connections (768 times 26),
but only 1068 free parameters (768 biases plus 25 times 12 feature kernels) since
many connections share the same weight.
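
The behavior of one constrained feature map in layer 20 can be sketched as a
strided convolution with a shared 5 by 5 kernel, one bias per unit, and a
constant background of -1 outside the image. The exact alignment of the receptive
fields (the `offset` of -2 below), the use of a plain `tanh`, the kernel values,
and all function names are assumptions made for illustration.

```python
import math

def constrained_feature_map(image, kernel, biases, stride=2, offset=-2,
                            background=-1.0):
    """Apply one shared kernel at every unit position of a feature map.

    Unit (i, j) looks at the k x k patch whose top-left corner is at
    (stride * i + offset, stride * j + offset); pixels outside the image
    read as `background`. Each unit has its own bias.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    feature_map = []
    for i, bias_row in enumerate(biases):
        out_row = []
        for j, bias in enumerate(bias_row):
            total = bias
            for u in range(k):
                for v in range(k):
                    r = stride * i + offset + u
                    c = stride * j + offset + v
                    inside = 0 <= r < height and 0 <= c < width
                    pixel = image[r][c] if inside else background
                    total += kernel[u][v] * pixel
            out_row.append(math.tanh(total))
        feature_map.append(out_row)
    return feature_map

def blank_image_with_dot(row, col, size=16):
    """Background image (-1 everywhere) with a single bright pixel."""
    image = [[-1.0] * size for _ in range(size)]
    image[row][col] = 1.0
    return image

# Arbitrary asymmetric kernel; biases are all zero here so that units are comparable.
kernel = [[0.01 * (5 * u + v) - 0.1 for v in range(5)] for u in range(5)]
biases = [[0.0] * 8 for _ in range(8)]

fmap_a = constrained_feature_map(blank_image_with_dot(6, 6), kernel, biases)
fmap_b = constrained_feature_map(blank_image_with_dot(8, 8), kernel, biases)
```

Because the kernel is shared and the stride is two, moving the bright pixel by
two input pixels moves the response by one unit: `fmap_a[i][j]` matches
`fmap_b[i + 1][j + 1]` wherever both indices are valid (this holds here because
the biases are equal).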

Layer 30 is also composed of 12 feature maps. Each feature map
contains 16 units arranged in a 4 by 4 plane. As before, these feature maps will be
designated as map 301, map 302, ..., map 312. The connection scheme between
layer 20 and layer 30 is quite similar to the one between the input and layer 20, but
slightly more complicated because layer 20 has multiple 2-D maps. Each unit in
layer 30 combines local information coming from 8 of the 12 different feature maps
in layer 20. Its receptive field is composed of eight 5 by 5 neighborhoods centered
around units that are at identical positions within each of the eight maps. Thus, a
unit in layer 30 has 200 inputs, 200 weights, and a bias. Of course, all units in a
given map are constrained to have identical weight vectors. The eight maps in
layer 20 on which a map in layer 30 takes its inputs are chosen according to the
following scheme. There are four maps in the first hidden layer (namely map 209 to
map 212) that are connected to all maps in the next layer and are expected to
compute coarsely-tuned features. Connections between the remaining eight maps
and layer 30 are as shown in FIGS. 7 through 19. The idea behind this scheme is
to introduce a notion of functional contiguity between the eight maps. Because of
this architecture, layer 30 units in consecutive maps receive similar error signals, and
are expected to perform similar operations. As in the case of layer 20, connections
falling off the boundaries of layer 30 maps take their input from a virtual plane
whose state is constant, equal to 0. To summarize, layer 30 contains 192 units
(12 times 4 by 4) and there is a total of 38592 connections between layer 20 and
layer 30 (192 units times 201 input lines). All these connections are controlled by
only 2592 free parameters (12 feature maps times 200 weights plus 192 biases).

Layer 40 has 30 units, and is fully connected to layer 30. The number of
connections between layer 30 and layer 40 is thus 5790 (30 times 192 plus 30
biases). The output layer has 10 units and is also fully connected to layer 40, adding
another 310 weights. The network has 1256 units, 64660 connections and 9760
independent parameters.
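
The unit, connection, and parameter totals quoted above can be reproduced with
simple arithmetic. The sketch below uses hypothetical variable names and assumes
that the 256 pixels of the 16 by 16 input are counted as units, which is what
makes the unit total come to 1256.

```python
KERNEL_AREA = 5 * 5

# Input: 16 by 16 pixels, counted as units (assumption needed to reach 1256).
input_units = 16 * 16

# Layer 20: 12 maps of 8 by 8 units, 25 shared weights per map, one bias per unit.
layer20_units = 12 * 8 * 8
layer20_connections = layer20_units * (KERNEL_AREA + 1)
layer20_params = layer20_units + 12 * KERNEL_AREA

# Layer 30: 12 maps of 4 by 4 units, each unit sees eight 5 by 5 neighborhoods.
layer30_units = 12 * 4 * 4
layer30_fan_in = 8 * KERNEL_AREA
layer30_connections = layer30_units * (layer30_fan_in + 1)
layer30_params = 12 * layer30_fan_in + layer30_units

# Layer 40: 30 units fully connected to layer 30 (no weight sharing).
layer40_units = 30
layer40_connections = layer40_units * layer30_units + layer40_units

# Output layer: 10 units fully connected to layer 40.
output_units = 10
output_connections = output_units * layer40_units + output_units

total_units = (input_units + layer20_units + layer30_units
               + layer40_units + output_units)
total_connections = (layer20_connections + layer30_connections
                     + layer40_connections + output_connections)
total_params = (layer20_params + layer30_params
                + layer40_connections + output_connections)
```

Evaluating these expressions gives 1256 units, 64660 connections, and 9760 free
parameters, in agreement with the figures stated in the description.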

FIG. 3 shows sample interconnections and feature extraction and
detection from image 10 to constrained feature map 201. Unit 210 in map 201
observes a 5 x 5 neighborhood on the input image plane and uses weights from an
exemplary kernel 221 in FIG. 5 to develop the value of unit 210. The gray scale unit
value shows the presence, substantial presence, substantial absence, or absence of
that feature in the input image in that neighborhood. The function performed by
each computational element in the constrained feature map is interpreted as a
nonlinear convolution of a 5 x 5 receptive field of image pixels or units with a 5 x 5
kernel. For units (computation elements) that are one unit apart in map 201, their
receptive fields in the input image layer are two pixels apart. Other units in
constrained feature map 201 use the same kernel as used by unit 210. Other maps in
layer 20 include units which operate on the image in a manner identical to map 201
using different kernels from that shown in FIG. 3. See FIGs. 5 and 6 for differences
in exemplary kernels for the associated constrained feature maps in layer 20.

As shown in FIG. 3, image 10 includes a 16 x 16 array 101 comprising
an image of the original character surrounded by a constant-valued border which is 2
pixels wide resulting in an 18 x 18 image array 102. Constrained feature map 201 is
shown as an 8 x 8 array.

Interconnections from constrained feature maps in layer 20 to units in
constrained feature maps of layer 30 are not shown because of complexity of the
drawing. The interconnections are similar to the one shown in FIG. 3 with the
addition of interconnections from other feature maps to determine a specific unit
value. Functionally, this interconnection is a nonlinear convolution with several
5 x 5 kernels (see FIGS. 5 through 19). All other interconnections between the first
and second feature detection layers result in a nonlinear convolution using a composite
kernel or two separate kernels (such as two 5 x 5 kernels) on a composite array of
units from similar (e.g., 5 x 5) receptive fields on eight different feature reduction
maps. As contemplated for the network shown in FIG. 2, maps 301 through 312 are
4 x 4 arrays.

FIGs. 4 through 19 show an exemplary set of kernels learned for the
network shown in FIG. 2. The kernels are used by the computational elements for
constrained feature maps in the first and second feature detection layers. Increased
brightness levels for the individual squares indicate more positive analog (gray level)
values for the weights in the kernel. Increased darkness levels for the individual
squares indicate more negative analog (gray level) values for the weights in the
kernel. Kernels 221 through 232 are used on image 10 to produce constrained
feature maps 201 through 212, respectively.

For the exemplary network embodiment shown in FIGs. 2 through 19, it
has been estimated that there are approximately 65,000 connections and only
approximately 10,000 free parameters. It should be noted that the network
architecture and constraints on the weights have been designed to incorporate
sufficient knowledge of the geometric topology of the recognition task.

It should be clear to those skilled in the art that constrained feature map
sizes, dimensionality reduction layer sizes, receptive fields, kernel sizes and array
sizes may be changed without departing from the spirit and scope of this invention.
Moreover, it should also be clear to those skilled in the art that other sets of
alphabetic and alphanumeric characters can be recognized with only slight
adjustments to the network architecture.
Claims:

1. A massively parallel computation network for recognition of a
character included in an image map, said network including a first constrained
feature detection layer for extracting features from said image map and for
undersampling said image, a second constrained feature detection layer for
extracting features from said first constrained feature detection layer and for
undersampling said first feature detection layer, first dimensionality reduction layer
substantially fully connected to and responsive to said second constrained feature
detection layer, and second dimensionality reduction layer substantially fully
connected to and responsive to said first dimensionality reduction layer for
classifying the character recognized by the network and generating an indication
representative of the character recognized by the network.

2. The computation network defined in claim 1 wherein said image map
includes a substantially constant predetermined background surrounding an original
character image.

3. The computation network defined in claim 1 wherein said first
constrained feature detection layer includes M groups of m units arranged as
independent feature maps and said second constrained feature detection layer
includes N groups of n units arranged as independent feature maps, and M, N, m,
and n are positive integers where M ≤ N and m ≥ n.

4. The computation network defined in claim 3 wherein N and M are
equal.

5. The computation network as defined in claim 3 wherein said first
dimensionality reduction layer comprises L groups of one unit each, said second
dimensionality reduction layer comprises K groups of one unit each, where K and L
are positive integers and K is greater than N and less than L.

6. The computation network defined in claim 5 wherein N and M are
equal.



7. The computation network defined in claim 3 wherein substantially
each unit has associated therewith a corresponding computational element for
generating a value for the associated unit, each said computational element having a
weighting kernel associated therewith and being responsive to a plurality of
substantially neighboring units from at least a predetermined other layer for mapping
a dot product of said associated weighting kernel with said predetermined plurality
of substantially neighboring units into an output value in accordance with a selected
nonlinear criterion, each said computation element responsive to a different plurality
of substantially neighboring units than each other computation element associated
with the same map, said first constrained feature detection layer responsive to image
units, said second constrained feature detection layer responsive to units from at
least one feature map in said first constrained feature detection layer, each unit in
said first dimensionality reduction layer responsive to substantially every unit in said
second constrained feature detection layer representative of the character recognized
by the network, and each unit in said second dimensionality reduction layer
responsive to substantially every unit in said first dimensionality reduction layer.

8. The computation network defined in claim 7 wherein the selected
nonlinear criterion includes a sigmoidal function.

9. The computation network defined in claim 7 wherein the selected
nonlinear criterion includes a piecewise nonlinear function.

10. The computation network defined in claim 7 wherein N and M are
equal.



11. The computation network as defined in claim 7 wherein said first
dimensionality reduction layer comprises L groups of one unit each, said second
dimensionality reduction layer comprises K groups of one unit each, where K and L
are positive integers and K is greater than N and less than L.

12. The computation network defined in claim 11 wherein N and M are
equal.

13. The computation network defined in claim 12 wherein the selected
nonlinear criterion includes a sigmoidal function.

14. The computation network defined in claim 12 wherein the selected 
nonlinear criterion includes a piecewise nonlinear function. 